Korean prosody generation and artificial neural networks
Abstract
To produce more natural synthetic speech with a Korean TTS (Text-To-Speech) system, we need to know all the possible prosodic rules of the Korean language. These rules can be extracted from linguistic and phonetic knowledge or by analyzing real speech. In general, all of these rules are integrated into the prosody-generation algorithm of a TTS system. However, such an algorithm cannot cover every possible prosodic rule of a language and is therefore imperfect, so the quality of the synthesized speech falls short of expectations. We therefore propose artificial neural networks (ANNs) that can learn the prosodic rules of the Korean language. A Multi-Layer Perceptron (MLP) trained with the error Back-Propagation (BP) algorithm was selected as the ANN for this study. To train and test these ANNs, we built a corpus of meaningful sentences constructed from a corpus of phonetically balanced (PB) isolated words. These sentences were read by one male speaker, recorded, and collected as a speech database. We analyzed the recorded speech to extract the prosodic information of each phoneme and used it to create target and test patterns for the ANNs. We found that the ANNs could learn prosody from real speech and generate the prosody of a sentence given to them as input.
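As a rough illustration of the approach described in the abstract, here is a minimal sketch of a one-hidden-layer MLP trained with error back-propagation to map per-phoneme feature vectors to prosodic targets. The input features (vowel flag, position in word, position in sentence, phrase-final flag), the two outputs (duration and F0, both scaled to [0, 1]), the layer sizes, the learning rate, and every pattern value are hypothetical choices made for this example. The abstract does not specify them, and the numbers are invented, not measured data.

```python
import math
import random

# HYPOTHETICAL toy "target patterns" (invented, not measured data).
# Input per phoneme: [is_vowel, position_in_word, position_in_sentence, is_phrase_final]
# Target per phoneme: [scaled duration, scaled F0], both in [0, 1]
PATTERNS = [
    ([1, 0.0, 0.0, 0], [0.4, 0.7]),
    ([0, 0.5, 0.3, 0], [0.2, 0.6]),
    ([1, 1.0, 0.6, 0], [0.5, 0.5]),
    ([1, 1.0, 1.0, 1], [0.9, 0.2]),
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MLP:
    """One-hidden-layer perceptron trained with online error back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra final weight that acts as the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # input plus bias unit
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]  # hidden activations plus bias unit
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, t, lr=0.5):
        xb, hb, y = self.forward(x)
        # Output deltas for squared error with sigmoid output units.
        d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
        # Hidden deltas: propagate output deltas back through w2 (bias unit excluded).
        d_hid = [
            hb[j] * (1 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        # Gradient-descent weight updates (deltas were computed before any update).
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]


def mean_error(net, patterns):
    """Mean of 0.5 * squared error over all patterns."""
    total = 0.0
    for x, t in patterns:
        y = net.forward(x)[2]
        total += 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))
    return total / len(patterns)


if __name__ == "__main__":
    net = MLP(n_in=4, n_hidden=6, n_out=2)
    before = mean_error(net, PATTERNS)
    for epoch in range(2000):
        for x, t in PATTERNS:
            net.train_step(x, t)
    after = mean_error(net, PATTERNS)
    print(f"mean error before: {before:.4f}, after: {after:.4f}")
    # Generate (scaled duration, scaled F0) for an unseen phoneme pattern.
    print(net.forward([0, 0.8, 0.9, 1])[2])
```

Because the output units are sigmoids, predictions stay in (0, 1). Real phoneme durations and F0 values would therefore need to be scaled into that range when the target patterns are built, and scaled back when the network's output is used for synthesis.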
Similar resources
Number of output nodes of artificial neural networks for Korean prosody generation
We have been studying artificial neural networks (ANNs) that can learn and generate the prosody of a Korean sentence. To produce more natural synthetic speech with a Korean TTS (Text-To-Speech) system, we have to know all the possible prosodic rules of the Korean language and integrate all of these rules into an algorithm. We can get these rules from linguistic, phonetic knowledge or by analyzi...
Prosody generation with a neural network: weighing the importance of input parameters
As an alternative to synthesis-by-rule, the use of neural networks in speech synthesis has been successfully applied to prosody generation, yet it is not known precisely which input parameters are responsible for good results. The approach presented here tries to quantify the contribution of each input parameter. This is done first by comparing the mean errors of networks trained with only one ...
GENERATION OF MULTIPLE SPECTRUM-COMPATIBLE ARTIFICIAL EARTHQUAKE ACCELEROGRAMS WITH HARTLEY TRANSFORM AND RBF NEURAL NETWORK
The Hartley transform, a real-valued alternative to the complex Fourier transform, is presented as an efficient tool for the analysis and simulation of earthquake accelerograms. This paper introduces a novel method based on the discrete Hartley transform (DHT) and a radial basis function (RBF) neural network for generating artificial earthquake accelerograms from specific target spectra. Acce...
Duration Control by Asymmetric Causal Retro-Causal Neural Networks
The generation of pleasant prosody parameters is very important for speech synthesis. A prosody generation unit can be seen as a dynamical system. In this paper sophisticated time-delay recurrent neural network (NN) topologies are presented which can be used for the modeling of dynamical systems. Within the prosody prediction task left and right context information is known to influence the pre...
Duration Modeling for Ar Synthesi
Duration modeling is a fundamental task of prosody generation for Text To Speech (TTS) systems. The objective of this task is to predict the duration of a speech unit from its phonological representation. Duration modeling has a significant influence on the intelligibility and the naturalness of the synthesized speech. This paper presents a Neural Network (NN) based approach to predict the dura...
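The accelerogram entry in this list describes the discrete Hartley transform (DHT) as a real-valued alternative to the Fourier transform. A minimal sketch of that relationship, using the textbook definition H[k] = sum over n of x[n] * cas(2*pi*k*n/N) with cas(t) = cos(t) + sin(t), follows. The function names are illustrative and the direct O(N^2) sums are chosen for clarity, not speed; this is not that paper's implementation.

```python
import cmath
import math


def dht(x):
    """Discrete Hartley transform by direct summation (O(N^2), for illustration)."""
    n = len(x)
    return [
        sum(
            x[i] * (math.cos(2 * math.pi * k * i / n) + math.sin(2 * math.pi * k * i / n))
            for i in range(n)
        )
        for k in range(n)
    ]


def dft(x):
    """Discrete Fourier transform by direct summation, used only for comparison."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)) for k in range(n)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.5, -1.0, 3.0]
    h = dht(signal)
    # For real input, H[k] = Re(X[k]) - Im(X[k]), where X is the DFT.
    print([round(v, 6) for v in h])
    print([round(X.real - X.imag, 6) for X in dft(signal)])
    # The DHT is its own inverse up to a factor of 1/N.
    print([round(v / len(signal), 6) for v in dht(h)])
```

The two transforms carry the same information for real signals, but the DHT stays entirely in real arithmetic, which is the property the entry's abstract highlights.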